
Add LLM Integration Tests #603

Open
wants to merge 16 commits into main

Conversation

devin-ai-integration[bot] (Contributor) commented Dec 24, 2024

🔍 Review Summary

Purpose:

  • Enhance the testing framework by integrating tests for multiple LLM providers.

Changes:

  • Configuration: Introduced environment variables for API keys in GitHub workflow and initialized various LLM providers.
  • Enhancement: Improved async handling and error management for AI21, Groq, Litellm, and Mistral providers.
  • Test: Expanded testing to include comprehensive integration tests for all LLM providers, covering both synchronous and asynchronous call patterns.
  • Dependencies: Updated tox.ini to include necessary test dependencies for new providers.

Impact:

  • Significantly enhances the reliability and coverage of our testing infrastructure, improving code quality and system integrity.
Original Description

Adds integration tests for Anthropic, Cohere, Groq, Litellm, Mistral, AI21

This PR adds comprehensive integration tests for multiple LLM providers:

  • Anthropic (Claude)
  • Cohere
  • Groq
  • Litellm
  • Mistral
  • AI21

Each test verifies four types of calls (a minimal sketch of this shape follows the list):

  1. Synchronous (non-streaming)
  2. Synchronous (streaming)
  3. Asynchronous (non-streaming)
  4. Asynchronous (streaming)
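
A minimal sketch of that four-call shape, using the OpenAI client as the illustration (its snippets appear in the review comments below; the other providers follow the same pattern with their own SDKs):

```python
import os
from openai import OpenAI, AsyncOpenAI

def sync_no_stream():
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from sync no stream"}],
    )

def sync_stream():
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    for _ in stream:  # consume the stream so the full call path is exercised
        pass

async def async_no_stream():
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from async no stream"}],
    )

async def async_stream():
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    stream = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from async streaming"}],
        stream=True,
    )
    async for _ in stream:
        pass
```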

The PR also:

  • Adds necessary test dependencies to tox.ini
  • Updates GitHub workflow with required environment variables
  • Adds debug prints for API key and session verification
  • Enables LLM call instrumentation in tests

Link to Devin run: https://app.devin.ai/sessions/e034afaf9cfb45529f3b652de116cf0e


Co-Authored-By: Alex Reibman <[email protected]>
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


Walkthrough

This update enhances the testing framework by integrating tests for multiple LLM providers, including Anthropic, Cohere, Groq, Litellm, Mistral, and AI21. Key changes include:

  • Environment Setup: Added API key environment variables in the GitHub workflow.
  • Provider Configuration: Initialized and configured new LLM providers in agentops/__init__.py.
  • Provider Enhancements: Refactored AI21, Groq, Litellm, and Mistral providers for better async handling and error management.
  • Testing: Comprehensive integration tests were added for all new providers, ensuring coverage of synchronous and asynchronous call patterns. OpenAI tests were updated to use the gpt-3.5-turbo model.
  • Dependencies: Updated tox.ini to include necessary test dependencies for the new providers.

These changes ensure robust testing and validation of LLM interactions across various providers.
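
In the tests, instrumentation is enabled before any provider call is made. A hypothetical sketch of that setup (the `instrument_llm_calls` flag and `end_session` call are assumptions about the agentops API, not taken from this diff):

```python
import os
import agentops

# Assumed initialization pattern: start a session and patch supported LLM clients
agentops.init(
    api_key=os.getenv("AGENTOPS_API_KEY"),
    instrument_llm_calls=True,  # assumed flag name for enabling LLM call recording
)

# ... run the sync/async provider calls under test here ...

agentops.end_session("Success")  # assumed call for closing the recorded session
```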

Changes

| File(s) | Summary |
| --- | --- |
| .github/workflows/python-testing.yml | Added environment variables for API keys of new LLM providers. |
| agentops/__init__.py | Initialized new LLM providers and configured them for testing. |
| agentops/llms/providers/ | Refactored and enhanced LLM providers (AI21, Groq, Litellm, Mistral) with improved async handling, error management, and response handling. |
| tests/ | Added comprehensive integration tests for Anthropic, Cohere, Groq, Litellm, Mistral, and AI21 providers covering all call patterns. Updated OpenAI tests to use gpt-3.5-turbo. |
| tox.ini | Included test dependencies for new LLM providers. |

🔗 Related PRs

  • GH Actions: fix the pipeline #564: The PR updates GitHub Actions workflows to utilize the absolute uv managed runtime, improves the static analysis workflow, and fixes Python test actions to ensure proper environment management.
  • version bump #546: This pull request outlines changes made and includes sections for a description and testing validation.
  • deps: remove packaging; unpinned & ranged versioning #561: The pull request addresses several issues, clarifies that packaging is an implicit dependency of setuptools, restores a loose dependency on psutil, and caps all dependencies at their latest stable versions for security and performance reasons.
  • fix tests #562: The pull request addresses issues with newer PsUtil versions failing tests by removing unused arguments in tuples.
Instructions

Emoji Descriptions:

  • ⚠️ Potential Issue - May require further investigation.
  • 🔒 Security Vulnerability - Fix to ensure system safety.
  • 💻 Code Improvement - Suggestions to enhance code quality.
  • 🔨 Refactor Suggestion - Recommendations for restructuring code.
  • ℹ️ Others - General comments and information.

Interact with the Bot:

  • Send a message or request using the format:
    @bot + *your message*
Example: @bot Can you suggest improvements for this code?
  • Help the Bot learn by providing feedback on its responses.
    @bot + *feedback*
Example: @bot Do not comment on `save_auth` function !

Execute a command using the format:

@bot + */command*

Example: @bot /updateCommit

Available Commands:

  • /updateCommit ✨: Apply the suggested changes and commit them (or click the GitHub Action button to apply the changes).
  • /updateGuideline 🛠️: Modify an existing guideline.
  • /addGuideline ➕: Introduce a new guideline.

Tips for Using @bot Effectively:

  • Specific Queries: For the best results, be specific with your requests.
    🔍 Example: @bot summarize the changes in this PR.
  • Focused Discussions: Tag @bot directly on specific code lines or files for detailed feedback.
    📑 Example: @bot review this line of code.
  • Managing Reviews: Use review comments for targeted discussions on code snippets, and PR comments for broader queries about the entire PR.
    💬 Example: @bot comment on the entire PR.

Need More Help?

📚 Visit our documentation for detailed guides on using Entelligence.AI.
🌐 Join our community to connect with others, request features, and share feedback.
🔔 Follow us for updates on new features and improvements.

Comment on lines 33 to 42
```python
def sync_stream():
    litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
    stream_result = litellm.completion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    for chunk in stream_result:
        if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
            pass
```


🤖 Bug Fix:

Handle Stream Content in sync_stream
Ensure sync_stream processes or stores stream content to avoid logical errors.

🔧 Suggested Code Diff:
```python
for chunk in stream_result:
    if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
        # Process or store the content here
        print(chunk.choices[0].delta.content)
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
+import os
+import litellm
+
 def sync_stream():
     litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
     stream_result = litellm.completion(
         model="anthropic/claude-3-opus-20240229",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
         stream=True,
     )
     for chunk in stream_result:
         if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
-            pass
+            # Process or store the content here
+            print(chunk.choices[0].delta.content)
```
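
A variant of this suggestion that collects the streamed text and asserts it is non-empty would exercise the stream more strictly; a sketch (the assertion is illustrative and not part of this PR):

```python
import os
import litellm

def sync_stream():
    litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
    stream_result = litellm.completion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    collected = []
    for chunk in stream_result:
        # Accumulate each streamed delta instead of printing it
        if hasattr(chunk, "choices") and chunk.choices[0].delta.content:
            collected.append(chunk.choices[0].delta.content)
    assert "".join(collected), "expected non-empty streamed content"
```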

Comment on lines 32 to 36
```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
```


⚠️ Potential Issue:

Model Change in OpenAI API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call is significant and could impact the application's behavior and performance. It's crucial to verify that 'gpt-3.5-turbo' meets the requirements previously fulfilled by 'gpt-4o-mini'. If this change is intentional, ensure that all related documentation and test cases are updated accordingly. If not, consider reverting to 'gpt-4o-mini' or selecting a more suitable model.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
+        stream=True,
+    )
```

Comment on lines 42 to 46
```diff
 async def async_no_stream():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async no stream"}],
```


⚠️ Potential Issue:

Model Change Verification in OpenAI Integration Test
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the API call could impact the test's functionality and expected outcomes. It is crucial to verify that 'gpt-3.5-turbo' is the intended model for this test. If the change was unintentional, revert to 'gpt-4o-mini'. Ensure that the test requirements align with the capabilities of the new model to avoid unexpected results.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
-async def async_no_stream():
+import os
+from openai import AsyncOpenAI
+
+async def test_openai_integration():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async no stream"}],
+    )
```

Comment on lines 48 to 56

```diff
 async def async_stream():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     async_stream_result = await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async streaming"}],
         stream=True,
     )
     async for _ in async_stream_result:
```


⚠️ Potential Issue:

Model Change in API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the API call could lead to different outputs or performance issues. It's crucial to verify that 'gpt-3.5-turbo' meets the requirements previously fulfilled by 'gpt-4o-mini'. If this change is intentional, ensure that all related documentation and tests are updated to reflect this modification. This will help maintain consistency and avoid potential confusion or errors in the future.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
-async def async_stream():
+import os
+from openai import AsyncOpenAI
+
+async def test_async_openai_integration():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     async_stream_result = await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async streaming"}],
         stream=True,
     )
     async for _ in async_stream_result:
+        pass
```
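
For reference, async helpers like these are typically driven from a synchronous test body with asyncio.run; a sketch of the calling side (not part of the lines shown in this diff):

```python
import asyncio

def test_async_call_patterns():
    # Run the async helpers defined above to completion from a sync test
    asyncio.run(async_no_stream())
    asyncio.run(async_stream())
```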

Comment on lines 25 to 29
```diff
 def sync_no_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
```


⚠️ Potential Issue:

Verify Model Change in OpenAI API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call could impact the test's behavior and results. It is crucial to confirm that this modification is intentional and aligns with the test's objectives. If the change is deliberate, ensure that the test expectations are updated to accommodate any differences in model behavior or output. If not, revert to the original model to maintain test integrity.

🔧 Suggested Code Diff:
```diff
 def sync_no_stream():
-    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-    client.chat.completions.create(
-        model="gpt-4o-mini",
+    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+    client.chat.completions.create(
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
     )
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
 def sync_no_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
+    )
```
📜 Guidelines

Markdown:
• Use fenced code blocks and specify language when applicable
Python:
• Use f-strings or format methods for string formatting


Comment on lines 31 to 36

```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
```


⚠️ Potential Issue:

Review Change in OpenAI Model Version
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call is significant and should be carefully reviewed. This modification can impact the test's behavior and results, as different models may have varying capabilities and performance characteristics. Ensure that 'gpt-3.5-turbo' aligns with the test's objectives and does not introduce regressions. Additionally, update any related documentation or test expectations to reflect this change. Verify that the new model meets the requirements of the integration test, especially in terms of output consistency and performance.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
+        stream=True,
+    )
```

Comment on lines 48 to 56

```diff
 async def async_stream():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     async_stream_result = await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async streaming"}],
         stream=True,
     )
     async for _ in async_stream_result:
```


⚠️ Potential Issue:

Model Change in async_stream Function
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the async_stream function could impact the function's behavior and output. It's crucial to ensure that 'gpt-3.5-turbo' meets the same requirements and expectations as 'gpt-4o-mini'. This change might introduce unexpected behavior or performance differences.

Actionable Steps:

  • Review the requirements and expected outputs for the async_stream function.
  • Conduct thorough testing to verify that 'gpt-3.5-turbo' produces the desired results.
  • Ensure no regressions are introduced with this model change.

This will help maintain the integrity and performance of the integration test.


Comment on lines 25 to 29
```diff
 def sync_no_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
```


⚠️ Potential Issue:

Model Change Verification Required
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call could impact the application's behavior and output. It is crucial to verify if this change aligns with the application's requirements and expected outcomes. If the change is intentional, ensure that all related documentation and tests are updated to reflect this modification. If not, consider reverting to the original model or selecting a more suitable alternative.


devin-ai-integration bot and others added 4 commits December 24, 2024 07:39
…sync, create_stream, create_stream_async)

Co-Authored-By: Alex Reibman <[email protected]>
- Remove try-except blocks to improve debugging
- Add blank lines after imports for consistent formatting
- Keep error handling minimal and explicit

Devin Run: https://app.devin.ai/sessions/e034afaf9cfb45529f3b652de116cf0e

Co-Authored-By: Alex Reibman <[email protected]>